Phonetization of Arabic: rules and algorithms

Author

  • Yousif A. El-Imam
Abstract

One approach to the transcription of written text into sounds (phonetization) is to use a set of well-defined, language-dependent rules, which in most situations are augmented by a dictionary of exceptional words that constitute their own rules. The process of transcribing into sounds starts by pre-processing the text into lexical items to which the rules are applicable. The rules can be divided into phonemic and phonetic rules. Phonemic rules operate on the graphemes to convert them into phonemes. Phonetic rules operate on the phonemes and convert them into phones, or actual sounds. Converting written text into actual sounds, and developing a comprehensive set of rules for any language, is complicated by several problems that originate in the relative lack of correspondence between the spelling of lexical items and their sound content. For standard Arabic (SA) these problems are not as severe as they are for English or French, but they do exist. This paper presents a detailed investigation into all aspects of the phonetization of SA. The aim is to develop a comprehensive letter-to-sound conversion system for standard Arabic and to assess the quality of its transcriptions. In particular, the paper deals with the following issues: (1) the spelling and other problems of the SA writing system and their impact on converting graphemes into phonemes; (2) the development of a comprehensive set of rules for transcribing graphemes into phonemes; (3) the important contextual phonetic variations of SA phonemes, so as to determine their viable variants (phones); (4) the development of a set of rules for transcribing phonemes into phones; (5) the formulation of the grapheme-to-phoneme and phoneme-to-phone rules as algorithms suited to computer-based processing; and (6) an objective evaluation of the performance of the process of converting SA text into actual sounds.

Computer Speech and Language (2003), article in press. © 2003 Published by Elsevier Ltd. doi:10.1016/S0885-2308(03)00035-4
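
The two rule stages described in the abstract can be sketched as follows. An exception dictionary is consulted first, then phonemic rules map graphemes to phonemes, and then phonetic rules map phonemes to phones. This is a minimal, hypothetical Python illustration, not the rule set developed in the paper. It assumes fully vowelled (diacritized) input and uses whitespace splitting as a stand-in for pre-processing. It covers only a small assumed subset of letters and diacritics, and it models a single simplified phonetic rule: backing of /a/ next to an emphatic consonant, checked against immediate neighbours only. All function names, tables, and dictionary entries are illustrative.

```python
"""Toy two-stage letter-to-sound sketch for fully vowelled Arabic words.

Hypothetical illustration only: the tables and rules are a small assumed
subset, not the rule set from the paper.
"""

FATHA, DAMMA, KASRA = "\u064e", "\u064f", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"
MARKS = {FATHA, DAMMA, KASRA, SHADDA, SUKUN}
SHORT = {FATHA: "a", DAMMA: "u", KASRA: "i"}
# Letter that lengthens a preceding short vowel when it carries no mark.
LENGTHENER = {"a": "ا", "u": "و", "i": "ي"}

# Assumed consonant inventory (IPA); bare alif is handled via LENGTHENER.
CONSONANTS = {
    "ء": "ʔ", "أ": "ʔ", "إ": "ʔ", "ب": "b", "ت": "t", "ث": "θ",
    "ج": "dʒ", "ح": "ħ", "خ": "x", "د": "d", "ذ": "ð", "ر": "r",
    "ز": "z", "س": "s", "ش": "ʃ", "ص": "sˤ", "ض": "dˤ", "ط": "tˤ",
    "ظ": "ðˤ", "ع": "ʕ", "غ": "ɣ", "ف": "f", "ق": "q", "ك": "k",
    "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "j",
}
EMPHATICS = {"sˤ", "dˤ", "tˤ", "ðˤ"}

# Illustrative exceptions keyed on the unvowelled spelling: both words
# contain a long /aː/ that the spelling does not show.
EXCEPTIONS = {
    "هذا": ["h", "aː", "ð", "aː"],
    "لكن": ["l", "aː", "k", "i", "n"],
}


def split_units(word):
    """Group each base letter with the diacritics that follow it."""
    units = []
    for ch in word:
        if ch in MARKS and units:
            units[-1][1].add(ch)
        else:
            units.append((ch, set()))
    return units


def graphemes_to_phonemes(word):
    """Phonemic stage: exception lookup first, then letter rules."""
    bare = "".join(ch for ch in word if ch not in MARKS)
    if bare in EXCEPTIONS:
        return list(EXCEPTIONS[bare])
    units = split_units(word)
    out = []
    i = 0
    while i < len(units):
        letter, marks = units[i]
        if letter in CONSONANTS:
            out.append(CONSONANTS[letter])
            if SHADDA in marks:  # gemination: repeat the consonant
                out.append(CONSONANTS[letter])
        vowel = next((SHORT[m] for m in marks if m in SHORT), None)
        if vowel:
            nxt = units[i + 1] if i + 1 < len(units) else None
            # Short vowel + unmarked matching letter -> long vowel.
            if nxt and not nxt[1] and nxt[0] == LENGTHENER[vowel]:
                out.append(vowel + "ː")
                i += 2
                continue
            out.append(vowel)
        i += 1
    return out


def phonemes_to_phones(phonemes):
    """Phonetic stage: back /a/ and /aː/ next to an emphatic (simplified)."""
    phones = []
    for k, p in enumerate(phonemes):
        neighbours = phonemes[max(k - 1, 0):k] + phonemes[k + 1:k + 2]
        if p in ("a", "aː") and any(n in EMPHATICS for n in neighbours):
            phones.append(p.replace("a", "ɑ"))
        else:
            phones.append(p)
    return phones


def transcribe(text):
    """Whitespace split stands in for real text pre-processing."""
    return [phonemes_to_phones(graphemes_to_phonemes(w)) for w in text.split()]
```

`split_units` collects each letter's marks into a set, so the sketch gives the same result whichever order a shadda and its vowel mark are stored in. Looking up `EXCEPTIONS` before applying the letter rules mirrors the abstract's description of exceptional words that "constitute their own rules". A fuller system would likely also need rules for definite-article assimilation, hamzat al-wasl, tanween, and unvowelled input, all of which this sketch leaves out.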

Journal title:
  • Computer Speech & Language

Volume 18

Publication date: 2004